Introduction:
The primary objectives are to explore (1) the association between disability prevalence and age, and (2) the association between the prevalence of different types of disability across the US, especially between cognitive disability and mobility disability.
Background of the raw dataset
The data source for this program is the Disability and Health Data System (DHDS) 2018, an online data source that provides the prevalence of adult disabilities at the region/state level in the US in 2018. For each region/state, prevalence is reported for 6 functional disability types: cognitive (serious difficulty concentrating, remembering or making decisions), hearing (serious difficulty hearing or deaf), mobility (serious difficulty walking or climbing stairs), vision (serious difficulty seeing), self-care (difficulty dressing or bathing) and independent living (difficulty doing errands alone). Each is reported separately by age group, race, gender, and veteran status. Please note that all data points in the data source are actual states/regions.
Several specific questions were addressed.
- Is the prevalence of disability distributed evenly across states?
- What is the mean prevalence of each type of disability in each age group?
- Within each age group, which state has the highest/lowest overall prevalence of any disability? What about cognitive disability and mobility disability?
- Within each age group, what type of disability is the most prevalent across all the states?
- Is there an association between age and prevalence of disability?
- Is there an association between the prevalence of cognitive disability and that of mobility disability?
Methods:
Data Source
- The raw data were downloaded from the Centers for Disease Control and Prevention (https://data.cdc.gov/Disability-Health/DHDS-Prevalence-of-Disability-Status-and-Types-by-/qjg3-6acf).
- There are 7168 rows and 31 columns in the raw dataset; each row gives one piece of information at the state/region level.
- The raw dataset is in long format: each row reports the state- or region-level prevalence of one type of disability for one “response type” (e.g. age, race, gender, veteran status).
- The prevalences of the 6 disability types do not add up to the prevalence of any disability; a likely explanation is that some people have more than one condition.
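As a quick arithmetic check of the last point, the mean prevalences reported for ages 18-44 in the results table of this report can be summed and compared with the prevalence of any disability. The snippet below is only an illustrative sketch in Python (the analysis itself was done in R), and the variable names are made up for this example.

```python
# Mean prevalence (%) for ages 18-44, copied from the results table in this report.
any_disability = 18.86296
type_means = [12.01852, 2.580392, 4.754717, 3.348077, 1.911628, 5.454717]

# If nobody had more than one type of disability, the six type-specific
# prevalences would sum to at most the prevalence of any disability.
total_of_types = sum(type_means)
excess = total_of_types - any_disability

print(f"Sum of the six type-specific means: {total_of_types:.2f}%")
print(f"Prevalence of any disability:       {any_disability:.2f}%")
print(f"Excess of the sum over 'any':       {excess:.2f} percentage points")
```

The sum (about 30.07%) exceeds the prevalence of any disability (18.86%), which is consistent with some people being counted under several disability types.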
Data Preparation
- The R packages “data.table” and “dplyr” were the main tools used to inspect and clean the data and to create a final dataset for further analysis.
- To obtain a final dataset tailored to the research questions of interest, I kept only the relevant rows, i.e. those for which the “response type” was age. Some variables were renamed for easier reference.
- To allow comparisons between the prevalence of different types of disability, the dataset was reshaped from long to wide, with the prevalence values of the different disabilities listed as separate columns for each state/region by age.
- By comparing the prevalence values of the different types of disability, a new categorical variable was created to record the disability type with the greatest prevalence in each state/region by age.
- The final main dataset for analysis has 162 rows/observations (each row gives the statistics of one state/region for one age group) and 13 columns/variables of interest.
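The reshaping and the "most prevalent type" variable described above can be sketched as follows. This is a minimal illustration in Python rather than the R code actually used; the row layout, the function names `long_to_wide` and `add_most_prevalent`, and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical long-format rows: (state, age_group, disability_type, prevalence %).
# The values are invented for illustration and are not taken from DHDS.
long_rows = [
    ("State A", "18-44", "cognitive", 11.0),
    ("State A", "18-44", "mobility", 4.5),
    ("State A", "65+", "cognitive", 9.0),
    ("State A", "65+", "mobility", 25.0),
    ("State B", "18-44", "cognitive", 13.0),
    ("State B", "18-44", "mobility", 5.0),
]


def long_to_wide(rows):
    """Pivot to one record per (state, age group) with one key per disability type."""
    wide = defaultdict(dict)
    for state, age, dtype, prevalence in rows:
        wide[(state, age)][dtype] = prevalence
    return dict(wide)


def add_most_prevalent(wide):
    """For each (state, age group), return the type with the highest prevalence."""
    return {key: max(types, key=types.get) for key, types in wide.items()}


wide = long_to_wide(long_rows)
most_prevalent = add_most_prevalent(wide)
for key in sorted(wide):
    print(key, wide[key], "->", most_prevalent[key])
```

Each key of `wide` plays the role of one row in the final dataset, and `most_prevalent` corresponds to the new categorical variable.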
Exploratory Data Analysis
- In the final dataset for analysis, there are no missing data on the key variables of interest (prevalence values for the different types of disability).
- The prevalence of any disability is not normally distributed: with a mean of 30.92%, most data points cluster in two ranges, 15-25% and 40-45%. Except for hearing and mobility disabilities, the prevalence of each of the other disability types is approximately normally distributed. Mobility disability has the highest mean prevalence (16.6%) and self-care disability the lowest (4.365%).
- For ages 18-44, the mean and median prevalence of any disability are 18.86% and 18.7%; any disability is least prevalent in DC (12.9%) and most prevalent in Puerto Rico (29.3%). For ages 45-64, the mean and median are 29.77% and 28.1%; any disability is least prevalent in Colorado (20.6%) and most prevalent in Puerto Rico (53.3%). For ages 65+, the mean and median are 44.13% and 43%; any disability is least prevalent in Colorado (32.2%) and most prevalent in Puerto Rico (62.8%).
- These preliminary results were checked against an external CDC report stating that 26% of the US population has some type of disability (https://www.cdc.gov/ncbddd/disabilityandhealth/infographic-disability-impacts-all.html). Unfortunately, the raw dataset doesn’t provide a way to weight the data by age, so we are unable to generate a weighted overall average of the prevalence of any disability.
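To illustrate why the missing age weights matter, the sketch below compares the unweighted mean of the three age-group means with a weighted mean. The population shares in `assumed_share` are purely hypothetical placeholders, not census figures, so the weighted result only shows the direction of the effect.

```python
# Mean prevalence of any disability (%) by age group, as reported above.
mean_by_age = {"18-44": 18.86296, "45-64": 29.77407, "65+": 44.13148}

# HYPOTHETICAL adult population shares by age group (they must sum to 1).
# These are placeholders for illustration only, not real population data.
assumed_share = {"18-44": 0.45, "45-64": 0.33, "65+": 0.22}

# Unweighted: every age group counts equally.
unweighted = sum(mean_by_age.values()) / len(mean_by_age)

# Weighted: each age group counts in proportion to its assumed population share.
weighted = sum(mean_by_age[group] * assumed_share[group] for group in mean_by_age)

print(f"Unweighted mean of the three age groups: {unweighted:.2f}%")
print(f"Weighted mean under the assumed shares:  {weighted:.2f}%")
```

Because younger adults usually make up a larger share of the adult population, any such weighting pulls the overall figure below the unweighted 30.92%. The age-group means themselves are also unweighted across states, which this sketch does not address.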
Results
Table of Disability Prevalence by Age (%)
| Age group | Any disability | Cognitive | Hearing | Mobility | Vision | Self-care | Independent living |
|---|---|---|---|---|---|---|---|
| 18-44 | 18.86 | 12.02 | 2.58 | 4.75 | 3.35 | 1.91 | 5.45 |
| 45-64 | 29.77 | 12.41 | 7.27 | 17.70 | 6.62 | 5.43 | 8.29 |
| 65+ | 44.13 | 10.18 | 17.37 | 26.96 | 7.80 | 5.27 | 9.82 |
Boxplot for disability prevalence by age
Across the nation, the median (range) of the prevalence of any disability is 18.7% (12.9-29.3%) for ages 18-44, 28.1% (20.6-53.3%) for ages 45-64, and 43% (32.2-62.8%) for ages 65 and over.
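The per-age-group summary behind a boxplot (median, lowest and highest value, and the states where they occur) can be computed along the following lines. This is a hedged sketch in Python with invented records; the field names `state`, `age` and `any` and the function `summarize_by_age` are assumptions made for this example.

```python
from statistics import median

# Hypothetical wide-format records; the values are invented for illustration.
records = [
    {"state": "State A", "age": "18-44", "any": 14.0},
    {"state": "State B", "age": "18-44", "any": 19.0},
    {"state": "State C", "age": "18-44", "any": 27.0},
    {"state": "State A", "age": "65+", "any": 35.0},
    {"state": "State B", "age": "65+", "any": 43.0},
    {"state": "State C", "age": "65+", "any": 60.0},
]


def summarize_by_age(rows, column="any"):
    """Return the median plus the lowest and highest (value, state) per age group."""
    groups = {}
    for row in rows:
        groups.setdefault(row["age"], []).append((row[column], row["state"]))
    return {
        age: {
            "median": median(value for value, _ in pairs),
            "min": min(pairs),
            "max": max(pairs),
        }
        for age, pairs in groups.items()
    }


summary = summarize_by_age(records)
for age, stats in summary.items():
    print(age, stats)
```

Storing each value together with its state makes it easy to report both the extreme prevalence and the state/region where it occurs.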

Distribution of disability prevalence by age
Please note that each dot in the graph is an actual state/region; the data can look quite spread out because of the small sample size.
- The prevalence of any disability is approximately normally distributed for ages 18-44 and 65+, but not for ages 45-64.
- The prevalence of any disability is concentrated on the left for ages 18-44, with the most common value being 20.6% (N=10 states/regions).
- The prevalence of any disability is concentrated in the middle for ages 45-64, with the most common value being 39.6% (N=10 states/regions).
- The prevalence of any disability is concentrated on the right for ages 65+, with the most common value being 43% (N=10 states/regions).
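One way to obtain a "most common value" together with a count of states/regions is to bin the prevalence values and take the fullest bin. Whether the counts above were derived exactly this way is an assumption; the sketch below only illustrates the idea, with a hypothetical bin width and invented values.

```python
from collections import Counter


def modal_bin(values, width=2.0):
    """Bin values by the given width and return (bin_start, count) of the fullest bin."""
    counts = Counter(int(value // width) * width for value in values)
    # Prefer the highest count; break ties in favour of the lower bin.
    start, n = max(counts.items(), key=lambda item: (item[1], -item[0]))
    return start, n


# Hypothetical prevalence values (%) for one age group; not DHDS data.
values = [17.2, 18.1, 18.7, 18.9, 19.4, 20.3, 21.0, 24.8, 29.3]
start, n = modal_bin(values)
print(f"Fullest bin: {start:.0f}-{start + 2:.0f}% with {n} states/regions")
```

The result depends on the chosen bin width, so counts read from a dot plot or histogram should be interpreted together with its binning.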
The interactive graphs for the distribution of cognitive and mobility disability are presented on my website.
- The distribution of mobility disability prevalence is more concentrated than that of cognitive disability.
- For ages 18-44, the prevalence of mobility disability is under 10% in every state/region, and under 5% in most (N=25).
- For ages 45-64, the most common values are around 12.5% and around 16% (N=9 states/regions each).
- For ages 65+, the prevalence of mobility disability is concentrated between 22% and 28%, with the most common value around 27% (N=9 states/regions).

What is the most prevalent disability?
In all states, cognitive disability is the most prevalent disability for the young population (aged 18-44), while mobility disability is the most prevalent for the older population (please refer to the bar chart presented on my website).
Association between cognitive and mobility disability
- A positive association between the prevalence of cognitive disability and that of mobility disability is observed in all age groups.
- The slope is flattest in the younger population (aged 18-44) and becomes steeper in the older population (aged 45 and over).
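The slope comparison above can be quantified with a least-squares fit per age group. The sketch below is a minimal illustration in Python, assuming a simple linear fit of mobility on cognitive prevalence; the function name `slope_and_correlation` and all data values are hypothetical and do not reproduce the report's scatterplots.

```python
from math import sqrt


def slope_and_correlation(x, y):
    """Least-squares slope of y on x and the Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Hypothetical (cognitive %, mobility %) values per age group; invented numbers.
data = {
    "18-44": ([10.0, 12.0, 14.0, 16.0], [4.0, 4.5, 5.5, 6.0]),
    "65+": ([8.0, 10.0, 12.0, 14.0], [22.0, 26.0, 29.0, 34.0]),
}

results = {age: slope_and_correlation(cog, mob) for age, (cog, mob) in data.items()}
for age, (slope, r) in results.items():
    print(f"{age}: slope={slope:.2f}, r={r:.2f}")
```

A positive slope and correlation in every age group would match the first bullet, and a larger slope in the older groups would match the second. Neither establishes a causal relationship, since the unit of analysis is the state/region, not the individual.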

Geographic distribution of disability prevalence by age
- The prevalence of disability is not evenly distributed across the US in any age group.
- The percentage of people living with disabilities is highest in the South of the US, especially in Kentucky, West Virginia, and Mississippi. West Virginia stands out the most: its prevalence of any disability is 25.8% for ages 18-44, 48.4% for ages 45-64, and 61.1% for ages 65+.
Conclusion:
- Puerto Rico is the region with the highest prevalence of any disability in every age group in 2018.
- Across all states, cognitive disability and mobility disability are more prevalent than the other types of disability, with cognitive disability more prevalent in the youngest group (18-44) and mobility disability more common in the older groups (45 and over).
- Overall, the prevalences of any disability, hearing disability, mobility disability, vision disability and independent living disability are positively associated with age. People are more likely to live with these types of disability as they age.
- There is a positive association between the prevalence of cognitive disability and that of mobility disability, but more information is needed to ascertain this relationship.
- Based on the calculated average prevalence by age, the prevalence of any disability increases with age: 18.86% for 18-44, 29.77% for 45-64, and 44.13% for 65+. Similar increasing trends are observed for hearing disability, mobility disability, vision disability and independent living disability. A clear positive association between age and the prevalence of any disability was observed in the boxplots.
- As shown in the bar chart, cognitive disability has the greatest prevalence for ages 18-44 in all states in 2018. Mobility disability is the most prevalent for ages 45-64 in all states, and it is also the most prevalent for ages 65+ in most states (hearing disability has the greatest prevalence in several states).
- As shown in the scatterplots, there is a positive association between the prevalence of cognitive disability and that of mobility disability, meaning that if the prevalence of cognitive disability is high in a state, the prevalence of mobility disability is also likely to be high. The slopes for ages 45-64 and 65+ are close, whereas the line is relatively flat for ages 18-44.
- The prevalence of disability is not evenly distributed across the US in any age group. The percentage of people living with disabilities is highest in the South of the US.